Abstract

Capture the Flag (CTF) is a popular computer security exercise in
which teams compete against one another to attack and/or defend
programs in real time. CTFs are currently expensive to build and run:
each is a bespoke affair, with challenges and vulnerabilities crafted
by experts. This limits both educational value for players and what
researchers can learn from them about human activities such as
vulnerability discovery and exploitation. In this work, we take steps
towards making CTFs cheap and reusable by extending our LAVA bug
injection system to add exploitable vulnerabilities, enabling rapid
generation of new CTF challenges. New LAVA bug types, including a
memory corruption and an address disclosure, form a sufficient set of
primitives for program exploitation in most cases. We used these
techniques to create AutoCTF, a week-long event involving teams from
four universities. For evaluation, we conducted surveys and
semi-structured interviews after the event to understand how AutoCTF
differed from a handmade CTF, assessing not only challenge realism and
difficulty but also the relative effort expended on bug finding and
exploit development. Our preliminary results indicate that AutoCTF can
form the basis for cost-effective and reusable CTFs, allowing them to
be run often and easily to train new generations of security
researchers as well as provide empirical data on human vulnerability
discovery and exploit development.

1  Introduction 

There are over one hundred active CTFs listed on the CTF website
ctftime.org, a testament to the popularity of the activity. Despite
the fact that one could apparently play in two a week given this
abundance, we would argue this is too few. One problem is variance. No
two events are alike, with different flavor, emphasis, and quality.
This means it is difficult to use them to train in a focused area.
Another issue is that CTF contests are high-profile events and are
partly interpreted as a showcase of the organizers’ talents. Thus,
challenge writers are understandably biased towards creating something
totally new; on our past CTF teams, the water-cooler or barroom
conversation after an event has indeed mostly focused on the novelty
of the puzzles. These forces seem at odds with educational goals,
which require repeated practice of core skills.

Nevertheless, CTFs are touted as potentially powerful education and
training vehicles. We hypothesize, perhaps controversially, that the
top CTFs (DEF CON, Boston Key Party, PlaidCTF, etc.) might not
actually be very educationally useful. Rather, they are built to
evaluate the relative performance of CTF teams and players. That is,
CTFs are baseball games, with DEF CON finals as the World Series. But
there isn’t currently a clear CTF analogy to Spring Training or even
the regular practices of a high school baseball team.

Our aim is to fill this gap with a kind of CTF that is cheap to run
and re-run and that is easily configurable with respect to both
difficulty and topic. These CTFs will unabashedly reuse the same base
applications over and over again to focus attention on vulnerability
discovery and exploitation. We make this choice because it allows for
reuse, but we note that it can also be justified on realism grounds:
practical vulnerability discovery mainly deals in established programs
like Firefox and OpenSSL which have been around for years and see
frequent updates.

We present AutoCTF, a first step toward reusable, automatically
generated CTFs. The basic idea of AutoCTF is to assemble a stockpile
of applications into which we can repeatedly inject a handful of
exploitable bugs of known types to create a sequence of fresh CTF
competitions. The bugs will be in different places and thus both
vulnerability discovery and exploitation will require new effort and
even different knowledge and techniques. We use the LAVA bug injection
system, developed under previous work, extending it to be able to
insert exploitable bugs of a few key types. These bugs were added to
two base programs, one a souped-up echo server, and the other a
simplified stack-based interpreter. This resulted in four
auto-generated challenge programs which we supplemented with four
versions with manually-inserted bugs. AutoCTF ran for a week, during
which new challenges were made available each day to teams from four
competing universities. The task was to figure out how to exploit the
buggy program in order to exfiltrate a flag from a known place on the
file system. Four of the eight challenges were solved by the top
scoring team.

2  Background 
2.1  Capture  the  Flag 

Capture the Flag competitions have been a popular event for over 20
years. In the most general terms, a CTF is a competition in which
teams or individuals compete to accomplish some security-relevant
goal; upon accomplishing that goal they receive a flag (usually a
hard-to-guess string) that can be submitted as proof to the
competition organizers.

CTFs are usually divided into two types: attack-defense, in which
teams run services on a shared network and compete to compromise or
disrupt others’ services while keeping their own services available,
and jeopardy-style CTFs, in which teams solve puzzle-like challenges.

The puzzles in jeopardy-style CTFs come in many different flavors;
some common types are:

Reverse Engineering: Obfuscated programs that must be reverse
engineered to reveal a flag.

Pwnables: Intentionally vulnerable programs that can be exploited to
obtain a flag.

Crypto: Weak or poorly implemented cryptography; generally the flag is
hidden in an encrypted message that must be decrypted.

Web: A web site with some kind of vulnerability (e.g., SQL injection
or cross-site scripting) that can be exploited to reveal a flag.

These categories are not comprehensive but they provide a sense of the
range of challenges that are available. AutoCTF focuses on just one of
these categories, pwnables, by injecting exploitable vulnerabilities
into a small source program. Since many different vulnerabilities can
be added to a given program, this allows many substantially different
challenges to be created from the same initial program.

Although CTF challenges are fun, engaging and generally thought to be
a good vehicle for cybersecurity education, they are currently very
expensive in terms of human time and effort that must be expended.
Creating a good pwnable involves many labor-intensive steps: one must
write a small program that contains an intentional vulnerability (and,
ideally, no other bugs), assess its difficulty, create a sample
solution, and test it to make sure the creator’s assessment of the
challenge difficulty is roughly correct. All of these steps take time
and, more importantly, expertise.

To get a sense of the cost (in US dollars) of challenge creation, we
examined the contracts awarded by DARPA to create challenges for the
Cyber Grand Challenge (CGC). In particular, after consulting with one
of the CGC organizers, we focused on the contract awarded to Kaprica
Security, Inc., since the other contractors performed additional tasks
beyond challenge creation. Since Kaprica was awarded $1.9 million and
created 121 challenges for CGC, we can roughly approximate the cost of
a challenge at about $15,000. The authors’ own estimate for the cost
to create an 8-challenge CTF, the 2013 MIT LL CTF, is even higher.
Challenges for this event and others based on the same infrastructure
cost about $25,000 each to design, implement, and test. The reason for
this higher cost is that these were attack-defend CTFs, which have
more complicated moving parts to get working correctly.

The high cost of challenge creation is usually hidden, as many
challenges are created by expert volunteers in their spare time.¹
Unfortunately, this means that organizations with less expertise and
resources (or fewer connections) find it difficult to hold CTF
competitions. In addition, competitors often produce write-ups of
their solutions, so challenges can rarely be reused between events.
These factors mean that controlled, focused CTFs cannot currently be
run at the scale or frequency that we might wish.

2.2  LAVA 

A cheap and plentiful source of bugs in programs is not only useful
for CTF competitions; these automatically generated corpora can also
be used to evaluate and compare automated bug-finding techniques such
as fuzzers, symbolic execution engines, and static analysis tools. In
prior work, we built a system for Large-scale, Automated Vulnerability
Addition (LAVA). LAVA adds memory corruption bugs to C source code;
each bug generated comes with a triggering input that serves as proof
that the bug is real. Because AutoCTF builds on LAVA, we will briefly
describe the system and its capabilities and limitations here.

¹ For example, challenges for the NYU-run CSAW finals are created
partly by students and partly by soliciting challenges from experts in
the security industry.

    unsigned int lava_val = 0;
    void foo(FILE *f) {
        char x[16];
        fread(x, 16, 1, f);
        // DUA
        lava_val = *(unsigned int *)(x + 4);
    }

    void bar(char *baz) {
        printf("Value is %s\n",
            // Attack point
            baz + lava_val
                * (lava_val == 0x6176616c));
    }

Figure 1: An example of a bug inserted by our original LAVA system.
Although it is obvious that the pointer baz will go out of bounds when
the trigger condition is met, it is highly unlikely the bug will be
exploitable.

LAVA begins with a C source program and an input to that program. In
order to add bugs to the program, it finds portions of the input that
are currently unused and can be subverted to cause memory corruption
errors in the program. This data must be Dead (i.e., it does not
influence control flow), Uncomplicated (not significantly modified
from the input), and Available somewhere in the program; we call such
data a DUA. DUAs can then be used to trigger memory safety violations
anywhere along the path the program takes on the original input. The
site where a DUA is used to trigger a bug is called an attack point;
in its original form, LAVA attacked pointer arguments to functions by
adding the DUA to the pointer value to cause it to go out of bounds.
The pointer addition is guarded by a comparison with a trigger value
so that the bug only manifests for a single, precisely chosen input.
An example of a bug injected by the original LAVA system can be seen
in Figure 1.
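The trigger guard can be illustrated with a minimal sketch. The
function name guarded_offset and the macro name LAVA_MAGIC are ours,
chosen for illustration (the constant is the trigger value shown in
Figure 1); the sketch only computes the offset that would be added to
the pointer and never dereferences anything out of bounds.

```c
#include <stdint.h>

/* Trigger value from Figure 1; the name LAVA_MAGIC is hypothetical. */
#define LAVA_MAGIC 0x6176616cu

/*
 * Returns the offset a LAVA-style attack point would add to a pointer.
 * The comparison evaluates to 1 only when the DUA equals the trigger,
 * so every other input yields an offset of 0 and leaves the pointer
 * unchanged.
 */
uint32_t guarded_offset(uint32_t dua)
{
    return dua * (uint32_t)(dua == LAVA_MAGIC);
}
```

Multiplying by the comparison result, rather than branching, is the
same pattern used in Figure 1: for all but one DUA value the offset
collapses to zero.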

3  Approach 

3.1  Injecting  Exploitable  Bugs 

In the first iteration of the LAVA system, the injected bugs would
reliably crash the program (by corrupting pointers) but were not
exploitable. Modifying LAVA to be able to inject CTF challenges that
are exploitable was a significant effort. We needed to introduce new
bug types that would allow memory corruption, and also information
leaks to allow bypassing of library ASLR.

    data_flow[0] = *(unsigned int *)input;
    char *ip = ...;
    if (0x54494246 == data_flow[0]) {
        __asm__(
            "mov %0, %%rsp \n"
            "ret"

Figure 2: Example of direct stack pointer corruption

    data_flow[0] = *(unsigned int *)ip;
    data_flow[1] = *(unsigned int *)input;
    data_flow[2] = *(unsigned int *)ip2;
    string[string_pos
            + (0x52544144 == data_flow[0])
            * data_flow[1]]
        = *ip
            + (0x52544144 == data_flow[0])
            * data_flow[2];

Figure 3: Example of controlled relative memory write

For each vulnerability hypothesized by LAVA, we manually analyzed the
corresponding source code change to determine if the resulting program
was or was not exploitable. To ease this analysis, we stored details
about each insertable bug in a SQL database. The LAVA user could then
select a particular bug to insert based on any criteria they chose,
from source line number distance to observed temporal distance in the
original trace.

We added two types of program state corruption bugs: direct stack
pointer corruption (Figure 2) and controlled relative memory writes
(Figure 3). The first type allows a user of the modified program to
pivot the stack pointer to a user-controlled buffer. The second allows
them to write user-controlled bytes to a user-controlled offset from
an existing pointer in the program, typically on the stack or the
heap. Additionally, we added a leak of a variable address enabling
ASLR bypass (Figure 4).
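The controlled relative write can be modeled safely inside a single
flat buffer. The following is only a sketch: the names relative_write,
ARENA_SIZE, and WRITE_MAGIC are ours, the bounds check exists solely
so that the example never touches memory it does not own, and an
injected bug would of course have no such check.

```c
#include <stddef.h>
#include <stdint.h>

#define ARENA_SIZE  32u           /* hypothetical size, for illustration */
#define WRITE_MAGIC 0x52544144u   /* trigger value shown in Figure 3 */

/*
 * Models the write of Figure 3 inside one flat arena.
 *   df[0] is compared against the trigger,
 *   df[1] is the attacker-chosen offset added to the index,
 *   df[2] is the attacker-chosen value added to the stored byte.
 * Returns the index that was written, or ARENA_SIZE if the index fell
 * outside the arena and nothing was written.
 */
size_t relative_write(unsigned char *arena, size_t pos,
                      unsigned char value, const uint32_t df[3])
{
    uint32_t hit = (uint32_t)(df[0] == WRITE_MAGIC);
    size_t index = pos + (size_t)(hit * df[1]);

    if (index >= ARENA_SIZE)
        return ARENA_SIZE;

    arena[index] = (unsigned char)(value + hit * df[2]);
    return index;
}
```

With a non-matching df[0] the function behaves like the original
assignment; with the trigger present, both the destination and the
stored byte are under the caller's control.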

    data_flow[0] = *(unsigned int *)keystart;

    printf("RECALL %lu %s %lu %s\n",
        key->len,
        key->str,
        (0x59465567 == data_flow[0])
            ? &(recall->len)
            : recall->len,
        recall->str);

Figure 4: Example of leaking a heap address

Unfortunately, the clang tooling infrastructure that LAVA is based on
is not built for synthesizing complex additions to the abstract syntax
tree, making it difficult to prescribe the injection pattern for a new
class of bug. To address this problem, a key element of the
modifications to LAVA is an embedded domain-specific language (DSL) in
our C++ code (Figure 5), allowing rapid construction of new source
code changes. This small bit of engineering has made it much easier to
stitch together existing source code variables with new code such as
the expressions in Figure 3.
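To give a feel for what such stitching produces, here is a plain C
analogy that renders one guarded term of the kind seen in Figure 3 as
source text. It is not LAVA's DSL, which is embedded in C++; the
function render_guarded_term and its parameters are hypothetical.

```c
#include <stdio.h>

/*
 * Renders a guard-and-scale term such as
 *   (0x52544144 == data_flow[0]) * data_flow[1]
 * into out. Returns the number of characters written (excluding the
 * terminating NUL), or -1 if the buffer is too small.
 */
int render_guarded_term(char *out, size_t out_size, unsigned int magic,
                        int trigger_slot, int value_slot)
{
    int n = snprintf(out, out_size,
                     "(0x%08x == data_flow[%d]) * data_flow[%d]",
                     magic, trigger_slot, value_slot);
    if (n < 0 || (size_t)n >= out_size)
        return -1;
    return n;
}
```

A real injector works on the syntax tree rather than on strings, but
the composition is similar: a trigger comparison combined with an
existing program variable.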

3.2  Natural  Dataflow 

LAVA bugs require access to the DUA at the attack point in order to
compare the DUA with a magic value and trigger the bug. Because the
DUA may not be in scope at the attack point, LAVA must introduce some
form of dataflow to make the DUA or a copy of it available at the time
it is needed. As shown in Figure 1, the first iteration of LAVA
accomplished this by copying the DUA into a global variable lava_val
and accessing lava_val at the attack point. In order to support
multiple bugs in the same program, this approach was extended to use a
global array where each entry functioned as a lava_val for a different
bug. To allow injecting bugs into dynamic shared objects, we added
helper methods lava_set and lava_get, which copy and read the DUA’s
value to and from the global array.
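A minimal sketch of such helpers follows. Only the names lava_set and
lava_get come from the text; the signatures, the array name lava_vals,
the capacity LAVA_SLOTS, and the bounds checks are assumptions made
for illustration.

```c
#define LAVA_SLOTS 16   /* hypothetical capacity */

/* One entry per injected bug, playing the role of lava_val. */
static unsigned int lava_vals[LAVA_SLOTS];

/* Called at the DUA site: stash the DUA's value for bug number slot. */
void lava_set(unsigned int slot, unsigned int value)
{
    if (slot < LAVA_SLOTS)
        lava_vals[slot] = value;
}

/* Called at the attack point: fetch the value stored for bug slot. */
unsigned int lava_get(unsigned int slot)
{
    return slot < LAVA_SLOTS ? lava_vals[slot] : 0;
}
```

Because every injected bug goes through the same two functions, calls
to them are an easy landmark for a reverse engineer.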

    LIf(MagicTest(bug).render(), {
        LAsm({ UCharCast(
                 LStr(buffer->ast_name)) +
                 LDecimal(buffer->selected.low) },
             { "mov %0, %%rsp", "ret" })
    })

Figure 5: LAVA injection DSL code for bug in Figure 2

We were concerned that a hypothetical attacker could identify LAVA
bugs by focusing solely on calls to these helper functions and bypass
analyzing the rest of the program. To address this issue, we developed
an approach that better integrated LAVA’s dataflow with the original
program: in the program’s main function, we declare an integer array
called data_flow. We then modify all non-library function signatures
to include an additional first argument of int* data_flow, and modify
all function calls to include a pointer to the data_flow array as a
first argument. Finally, as shown in Figure 2, Figure 3, and Figure 4,
we modify the injected code at the DUA site to copy the DUA into the
data_flow array and modify the injected code at the attack point to
reference this array instead of calling lava_get. This approach
eliminates the need for a global data structure and the lava_get and
lava_set helper functions.
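The threaded-argument scheme might look roughly like this. The
function names parse_input, trigger_armed, and run are hypothetical
stand-ins for program functions; the constant is the trigger value
from Figure 4. The sketch uses memcpy rather than a pointer cast to
read the input bytes, and assumes a 4-byte int.

```c
#include <string.h>

#define LEAK_MAGIC 0x59465567u   /* trigger value shown in Figure 4 */

/* DUA site: copy four input bytes into slot 0 of the caller's array. */
void parse_input(int *data_flow, const unsigned char *input)
{
    memcpy(&data_flow[0], input, sizeof data_flow[0]);
}

/* Attack point: consult the threaded array rather than a global. */
int trigger_armed(int *data_flow)
{
    return (unsigned int)data_flow[0] == LEAK_MAGIC;
}

/* Stand-in for the program's main function, which owns the array. */
int run(const unsigned char *input)
{
    int data_flow[4] = {0};

    parse_input(data_flow, input);
    return trigger_armed(data_flow);
}
```

Since every non-library function receives the same extra argument, the
injected dataflow no longer stands out as a pair of distinctive helper
calls.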


3.3  Chaff  Bugs 

Releasing multiple versions of a program with different injected bugs
can easily fall victim to binary comparison tools, which try to
identify changes between two versions of a binary. The problem stems
from the fact that our injected code is small when compared to the
rest of the program’s codebase, so comparison quickly yields the
injected code. Using differential analysis to discover bugs is a
well-known technique and several tools have been developed to
facilitate this. To prevent this easy win we inject chaff bugs into
the targets before releasing them.

For this chaff, we use non-exploitable LAVA bugs of the type depicted
in Figure 1 injected into random points in the program. Our
non-exploitable bugs are still reachable via specially crafted inputs
and still crash the program by causing it to dereference unmapped
memory. Because these bugs are reachable, an attacker must consider
the exploitability of each bug individually. This strategy has the
additional benefit of preventing attackers from searching for
artifacts left by our system to more quickly find the bugs; after
finding the artifacts, the attacker must still distinguish between
real bugs and chaff bugs.


4  AutoCTF 

We designed a week-long Capture the Flag competition containing eight
challenges. Half of these challenges contained vulnerabilities
inserted automatically by LAVA and half contained vulnerabilities
inserted manually.

Each challenge was hosted in a Docker container based on Ubuntu 16.04.
We used CTFd as the scoreboard system. Challenges were released at 3pm
daily from May 3rd through May 9th as shown in Figure 6, and the
competition ended at 3pm on May 10th.
4.1  Base  Programs

We developed two simple applications in C that were modified to
contain both automatically generated and manually inserted
vulnerabilities. We used xinetd to connect the services’ input and
output to the network.

The first of these services, called blecho, is a key/value store
layered on top of an echo server. For each line of text sent to
blecho, it either stores a value, loads a value and prints it, or
ignores it. Values are stored in and retrieved from a temporary
directory in the filesystem. This program is 239 lines of source code
as measured by David Wheeler’s sloccount.

The second service, fifth, is an interpreter for a binary-format
stack-based programming language supporting basic operations: push,
add, print, etc. This program is 364 lines long.

4.2  Automated  Vulnerability  Insertion 

Using LAVA, two versions of each service were generated. One had a
controlled relative write bug added and the other had a direct stack
pointer corruption bug added. These bugs are of the type depicted in
Figures 3 and 2, respectively. Both had leaks (a la Figure 4) added to
aid in defeating address space layout randomization. To prevent
competitors from simply comparing different versions of the services
to find vulnerabilities, we also used LAVA to add different
non-exploitable bugs to each program (see Section 3.3 for details).

A developer had to create a short configuration file to run LAVA on
blecho and fifth. Once this configuration file was built, LAVA
automatically generated a database containing 25937 and 10856
injectable vulnerabilities in blecho and fifth, respectively. To
select a bug to insert, we queried the database to obtain one of the
desired exploitable type and instantiated that vulnerability into
source using LAVA. We inspected the resulting program to convince
ourselves that it should cause the intended effect without additional
unintended side effects. Finally, to confirm that a vulnerability was
indeed exploitable, we manually created an exploit for all but one of
the challenges before releasing them to competitors. The remaining
challenge was solved by the winning team.

The time required to add enough exploitable bugs and chaff to a base
program, as well as vet the output adequately so that it could be used
as a challenge, was about an hour. Some parts of this process could be
sped up, including database bug selection. However, this one-hour
estimate does not include the time required to verify exploitability,
which took several hours. Speeding up this aspect will be a major
focus of future efforts. Note, however, that this verification cost is
common to both automated and manual bug insertion.


4.3  Manual  Vulnerability  Insertion 

We released four challenges containing vulnerabilities that were
manually inserted into our two base programs, blecho and fifth.

The first vulnerability inserted into blecho removed logic that
prevented keys from containing non-alphanumeric characters. Since the
key was used as a filename, removing this logic allowed competitors to
use a path traversal attack to read a flag. The second vulnerability
added to blecho was a controlled relative write, similar to one of the
LAVA-generated vulnerabilities, but with a much smaller range of
possible addresses to write to. This vulnerability allowed corruption
of blecho’s storage directory, which in turn allowed arbitrary file
reads.
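The kind of check whose removal enables the path traversal might look
like the sketch below. This is a hypothetical reconstruction, not
blecho's source: the function name key_is_safe and the decision to
reject empty keys are our assumptions.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Returns 1 if the key is non-empty and every character is
 * alphanumeric, 0 otherwise. Rejecting '.' and '/' is what keeps a
 * key such as "../flag" from escaping the storage directory when the
 * key is later used as a filename.
 */
int key_is_safe(const char *key)
{
    if (key == NULL || *key == '\0')
        return 0;

    for (const char *p = key; *p != '\0'; p++) {
        if (!isalnum((unsigned char)*p))
            return 0;
    }
    return 1;
}
```

Deleting a single call to a check like this leaves the program fully
functional for ordinary keys, which is part of what makes the bug easy
to insert by hand.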

The two vulnerabilities inserted into fifth added a stack overflow and
a use-after-free. Exploitation of either allowed full control of
execution.

The time required to manually insert a bug varied from a few tens of
minutes to several hours. This variance is interesting and is
explained by three factors. First, if the original author of the base
program was adding a bug, this took much less time since that
individual already understood program function, control flow, and data
structures deeply. Second, if the insertion was very early in the
program execution this was easiest since almost none of the program
needed to be understood. Third, if the bug type was to be subtle,
involving limited but adequate control of a pointer that could
overwrite sensitive program data, then that sort of insertion
necessarily required deep knowledge of the program and took the
longest to get right. By contrast, auto-injected bugs using LAVA,
while not always subtle, always took on the order of an hour to
create.


5  Results 
5.1  Event 

We ran AutoCTF over the course of a week for four university security
clubs in May of 2017. Three teams solved at least one challenge, with
the winning team solving four of the eight challenges. Four teams were
initially registered, from four separate universities, but one team
dropped out and, generally, participation was low due to conflicts
with final exams and projects. For future iterations of AutoCTF, we
will choose a better time slot.
Date   Challenge     Created by  Vulnerability type
May 3  blecho day 1  LAVA        Controlled relative write (Figure 3)
May 4  blecho day 2  LAVA        Direct stack pointer corruption (Figure 2)
May 6  blecho day 4  Human       Path traversal
May 8  blecho day 6  Human       Limited controlled relative write
May 3  fifth day 1   Human       Stack overflow
May 5  fifth day 3   Human       Use-after-free
May 7  fifth day 5   LAVA        Controlled relative write (Figure 3)
May 9  fifth day 7   LAVA        Direct stack pointer corruption (Figure 2)

Figure 6: Schedule and description of each vulnerability


5.2  Interviews 

We interviewed six of the players to get their feedback on the event.
The main conclusion we drew was that the challenges were probably too
difficult for a small event, as some of the participants were fairly
new to the CTF world.

The participants had somewhat conflicting opinions on the reuse of
base programs. Some said they enjoyed the repetition, as it meant they
could build on their reverse engineering experience from previous
iterations of each program. Unfortunately, repetition inherently leads
to less variety in challenges, which a few participants disliked. One
said that, while the reverse engineering of each challenge iteration
was easier and faster, they did not enjoy the task of transcribing
notes from one IDA Pro database to another. All said they would play
if the event were held again with different bugs in the same base
programs. It is worth noting that the team with the most solves seemed
to prefer one of the base programs (fifth), which constituted three of
their four solves. In interviews, it became clear that they had
invested a fair amount of RE in that program and had insufficient
resources to spend as heavily on blecho. This seems an interesting
aspect to investigate in future AutoCTFs.

Participants were also split on whether it was more difficult to find
the bugs or to exploit the program. One player we interviewed, who had
significant experience playing CTFs (he had played in more than 10
events), thought that due to the base program reuse the difficulty was
almost entirely in exploitation, and even suggested it as a way to
train exploit development skills independently from reverse
engineering skills. On the other hand, at least one player was fairly
stymied by the chaff: although he found the injected chaff “fairly
transparent,” he was not able to determine which bugs were
exploitable, and commented that when there are many crashes but few
are exploitable it can be “demotivating.”

One repeated negative comment was that, especially to a human reverse
engineer, the LAVA magic value comparisons strongly stand out (Figure
3, line 8). Further, few teams seemed particularly slowed by chaff
bugs; in interviews they indicated that the chaff bugs were fairly
transparently not exploitable. These LAVA injection artifacts and
deficiencies will be an important area for future work.


5.3  Discussion 

These challenges were vastly easier and cheaper (in terms of time) to
create than challenges that some of the authors had made in past CTFs,
although verifying exploitability still took significant time. While
LAVA assists in this effort by generating an input that triggers the
bug, transforming such a crashing input into a working exploit is
always an involved task. Further reducing exploitability verification
time should be possible and is an ongoing effort. This is an area
where CTF exploitation diverges from the real world, as we think that
the ratio between time spent on vulnerability discovery and
exploitation tends more towards discovery in real-world programs,
which are usually much larger than a few hundred lines.

We believe that in the future, given a small stable of base programs,
we could easily run another AutoCTF event with little effort.

In terms of player experience, we found that the challenges were
probably more difficult than we had anticipated. For example, at NYU,
we saw 15-20 students participate on the first day; however, almost
all of these students were very new to CTFs and security in general,
and all but two (who were the most experienced at playing CTFs)
dropped out after that first day. One interesting nuance here is that
much of the difficulty was in the exploitation phase. The factors that
influence how challenging a bug is to exploit include things like the
size of the binary (and hence the availability of ROP gadgets) and
what exploit mitigations are turned on. These are not under the direct
control of LAVA right now, and so for future CTFs we may need to
extend the system further to exert control over these features and
thereby tune the difficulty more precisely.
6  Limitations

AutoCTF has three main limitations: vulnerability types, LAVA-required
application source editing, and exploitation difficulty.

The LAVA system, when we began this work, was able to inject memory
corruption bugs. Thus, it is hardly surprising that the class of CTF
challenges automatically created for AutoCTF involves out-of-bounds
reads and writes on the stack and heap. As discussed in Section 3,
controlled pointer writes and printf-based read disclosures were
identified as fairly straightforward LAVA bug type extensions that
would nevertheless provide sufficient offensive power for AutoCTF
players to practice modern exploitation techniques. However, this
means other kinds of exploitable bugs, both simple and complicated,
are not presently within its repertoire. For instance, LAVA cannot
inject any of the following for AutoCTF: directory traversal,
use-after-free, and more general read disclosures. These bug types all
seem possible with LAVA, and we have begun to think concretely about
how we might implement them. Other bug types such as side channels,
crypto and logic flaws seem more fundamentally out of reach. Our
intuition is that this is actually because they are rather broad and
ill defined; implementing an exploitable bug type in LAVA requires a
precise formulation. It is possible that there are specific classes of
bugs within these seemingly trickier categories that could be part of
AutoCTF’s future, once clearly specified.

LAVA bugs, additionally, are not always injectable in a
freshly written challenge program. One cause for this is a
mismatch between the syntactic constructs recognized by LAVA
in terms of Clang’s AST matchers and those found in the
source as written by a programmer. For instance, LAVA
identifies attack points for injecting exploitable bugs as
follows.

1.  A memory access attack point is an assignment in which
the left-hand side is

(a)  a pointer dereference such as *p = '\0', or

(b)  an index into an array such as a[i] = 7

2.  A printf attack point is a printf call containing an
integer argument, e.g. printf("%d\n", x)

These are the only places in a program source where LAVA can
inject code such that a memory corruption or information
leak can manifest there. If none of these constructs appear
in a program, LAVA will not find any locations where it can
attempt to add a bug. If a program doesn’t use arrays or
pointers, LAVA can’t add bugs to it. This means code may
have to be partly rewritten in terms of these constructs for
it to be usable in AutoCTF.


Additionally, LAVA will be unable to inject bugs into a
program that uses data in such a sparing way that there are
no or too few DUAs. This can be assessed by running LAVA,
noting how many DUAs it locates, and then modifying the
program to increase that number. LAVA locates DUAs as
tainted (attacker-controlled) data at particular points in a
program trace that satisfy the following requirements. A DUA
is

1.  at least as big as a machine pointer

2.  not  used  to  decide  many  previous  branches 

3.  not  a  complicated  function  of  input  bytes 
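
The three requirements can be read as a filter over candidate tainted values. The Python sketch below is illustrative only: the Candidate record, its field names, and both thresholds are assumptions made for this example, not LAVA's actual data structures or cut-offs.

```python
from dataclasses import dataclass

POINTER_BYTES = 4          # assumption: 32-bit target; use 8 for 64-bit
MAX_BRANCHES_DECIDED = 10  # hypothetical cut-off for "many" branches
MAX_COMPUTE_DEPTH = 10     # hypothetical cut-off for "complicated"


@dataclass
class Candidate:
    """A tainted value observed at one point in the program trace."""
    name: str
    tainted_bytes: int     # number of attacker-controlled bytes
    branches_decided: int  # earlier branches that depended on the value
    compute_depth: int     # operations between the input and the value


def is_dua(c: Candidate) -> bool:
    """Apply the three requirements in the order they are listed."""
    return (
        c.tainted_bytes >= POINTER_BYTES
        and c.branches_decided <= MAX_BRANCHES_DECIDED
        and c.compute_depth <= MAX_COMPUTE_DEPTH
    )


candidates = [
    Candidate("hdr.length", 4, 0, 0),   # passes all three checks
    Candidate("flag_byte", 1, 0, 0),    # too small
    Candidate("checksum", 4, 2, 40),    # too complicated
    Candidate("opcode", 4, 25, 1),      # decided too many branches
]
duas = [c.name for c in candidates if is_dua(c)]
print(duas)  # ['hdr.length']
```

Counting how many candidates survive such a filter is one way to picture the DUA count that the text suggests inspecting before modifying a program.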

One can increase the number of DUAs available in a program
only with a detailed understanding of the current data flow.
For instance, if one knows where data is first read in, one
might introduce additional buffers in which that data is
needlessly stored, and LAVA will find and use them to create
bugs. Note that the number of potential bugs injectable by
LAVA is roughly linear in the product of the number of DUAs
and the number of attack points, so we want to make both
numbers large.
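
As a rough back-of-the-envelope illustration of that product, the sketch below enumerates every pairing of a DUA with an attack point. The names and counts are invented for the example, and the enumeration ignores any ordering or reachability constraint between a DUA and an attack point, so the result should be read as an upper bound rather than a prediction.

```python
from itertools import product


def candidate_bugs(duas, attack_points):
    """Treat every (DUA, attack point) pairing as one potential bug."""
    return list(product(duas, attack_points))


duas = ["buf_a", "buf_b", "buf_c"]  # hypothetical DUA names
attack_points = ["*p = ...", "a[i] = ...", "q[j] = ...", "printf(x)"]

pairs = candidate_bugs(duas, attack_points)
print(len(pairs))  # 3 DUAs x 4 attack points = 12

# Adding one redundant buffer contributes a full row of new pairings.
more = candidate_bugs(duas + ["buf_copy"], attack_points)
print(len(more))   # 4 DUAs x 4 attack points = 16
```

This is why adding a single needlessly stored copy of the input can pay off: each new DUA is paired with every existing attack point.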

The bugs injected by LAVA also currently have an easily
recognizable trigger, a four-byte “magic value”; as seen in
Section 5, several different players noticed this feature
and found it unrealistic. Although our chaff injection
prevented this from being used as a shortcut to figure out
where the exploitable bug was injected, chaff can also be
frustrating to players. We hope to create more natural
triggers in the future by splitting up the comparison into
multiple smaller comparisons, applying transformations to
the DUA before the trigger comparison, and integrating it
more tightly with the program’s existing state and data
structures.
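
To make the first two of these ideas concrete, the sketch below contrasts a single four-byte comparison with two hypothetical variants that accept exactly the same input: one split into per-byte checks, and one that transforms the DUA before comparing. The magic constant, the XOR mask, and all function names are invented for this illustration; real triggers live in injected C code, and Python is used here only to check that the variants are equivalent.

```python
import struct

MAGIC = 0x6C617661          # hypothetical four-byte magic value
MASK = 0x5A5A5A5A           # hypothetical transformation constant
TRANSFORMED = MAGIC ^ MASK  # would be precomputed at injection time


def trigger_plain(dua: bytes) -> bool:
    """Current style: one conspicuous 32-bit comparison."""
    if len(dua) < 4:
        return False
    return struct.unpack("<I", dua[:4])[0] == MAGIC


def trigger_split(dua: bytes) -> bool:
    """Same condition expressed as four single-byte comparisons."""
    if len(dua) < 4:
        return False
    expected = struct.pack("<I", MAGIC)
    return all(dua[i] == expected[i] for i in range(4))


def trigger_transformed(dua: bytes) -> bool:
    """Same condition, but the comparison uses a masked constant."""
    if len(dua) < 4:
        return False
    value = struct.unpack("<I", dua[:4])[0]
    return (value ^ MASK) == TRANSFORMED


hit = struct.pack("<I", MAGIC) + b"rest"
miss = b"AAAArest"
for trigger in (trigger_plain, trigger_split, trigger_transformed):
    assert trigger(hit) and not trigger(miss)
```

All three predicates fire on the same inputs, so the triggering input LAVA already produces would remain valid; only the shape of the comparison changes.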

A final limitation of AutoCTF is that, given a LAVA-injected
bug, exploitability must be verified manually. This puts a
lower bound on the time to auto-generate a challenge that is
higher than we would like. That is, creating a new challenge
with LAVA might take fifteen minutes, but proving that it is
exploitable may take several hours. This situation sees a
parallel in AutoCTF game play, where players observed about
an order of magnitude difference between the time to find
the bug and the time to exploit it. Further, note that
players usually chose to exploit LAVA bugs via standard ROP
techniques, which entail a fairly lengthy but bounded
exploit development process. We never observed a player
choosing to exploit LAVA bugs to corrupt
application-specific data such as length fields and
directory string contents. This indicates a bias in players
to choose exploit strategies with known time requirements
over investing in understanding and exploring a binary to
find subtle data attacks that might be much easier to stage.
This is interesting, and we would like to design experiments
to measure this bias in the future. More importantly, we
believe that for AutoCTF to be viable, we will need to
invest in analyses that speed exploit development. These
analyses would leverage the fact that we know the exact
input required to control a LAVA-injected bug and precisely
where it manifests.

7  Future  Work 

In  the  future,  we  would  like  to  explore  just  how  much  of 
a  CTF  can  be  automated,  in  order  to  put  CTFs  within  the 
reach  of  as  many  people  as  possible.  At  the  same  time, 
we  would  like  to  improve  the  quality  of  the  automatically 
generated  challenges  and  ensure  sufficient  diversity.  We 
discuss  several  areas  of  study  needed  to  achieve  that  goal 
here,  and  consider  what  research  might  be  enabled  by 
such  a  system. 

7.1  Bug  Diversity  and  Realism 

The bugs we inject currently require very few prerequisites
for exploitation. This increases the odds that a given bug
will be exploitable, but limits the types of bugs we can
inject. As discussed in the previous section, we believe
that many types of memory safety bugs (both spatial and
temporal) are within reach; other bugs, such as timing
channels and cryptographic weaknesses, will require more
fundamental research in order to precisely specify and add
to an existing program. This research would benefit not only
automatically generated CTFs, but also our ongoing attempts
to automatically create high-quality vulnerability corpora
for evaluating bug discovery tools.

LAVA currently produces an input to trigger each bug it
injects, but not an exploit for the bug. Automatically
generating exploits is a studied academic problem but
remains a complex, open-ended task in most cases. With LAVA
we have a simpler problem, since we control the bug we are
injecting and can modify the source or binary code of the
program. Given these advantages, LAVA should be able to
provide more assistance to the person tasked with proving
the exploitability of the bug, or even provide the proof (in
the form of a working exploit) itself.

7.2  Improving  Automation 

Aside  from  proving  exploitability,  we  still  require  human 
intervention  to  create  the  base  programs  and  to  assign 
difficulty  scores.  We  believe  that  even  these  steps  might 
be  automated,  however. 

Creating small challenges by hand is not insurmountably
difficult, but it does pose some risks: unintended
exploitable bugs introduced in a base program will be
present in every challenge derived from it, which could
allow a large number of problems to be solved in the same
way. Instead of crafting the base programs by hand, we could
trawl GitHub to look for small C programs that read from
stdin and write to stdout. Most such programs will be too
large to be reasonably reverse engineered during a CTF, but
we may be able to use techniques such as program slicing to
reduce the programs to a more manageable size. The binary
comparison problem described in Section 3 arises in a
different form here: since the base programs are widely
available, participants might be able to obtain them and
compare them with our buggy version to locate the added bug.
Our chaff technique should work here as well, but we also
plan to investigate techniques such as binary stirring to
help prevent comparison.

A thornier challenge is difficulty estimation. As
exemplified in this CTF, where we inadvertently made the
challenges too hard for novice players, even human judgment
is not always very accurate in estimating the difficulty of
a challenge. Difficulty is influenced not only by
source-level features of the injected bug, but also by
features of the binary program (e.g., the availability of
ROP gadgets) and the runtime environment (e.g., exploit
mitigations such as ASLR). By running more and larger-scale
automatically generated CTF competitions, we hope to
identify features of programs, bugs, and environments that
contribute to the difficulty of an exploitation challenge
and use those features to automatically assign point values
to the generated challenges.


7.3  Researching  Human  Vulnerability  Discovery

Although automated tools have made great strides in recent
years, humans still hold an advantage when it comes to
finding deep, subtle vulnerabilities. However, just how
humans go about finding security vulnerabilities is not well
understood, in part because it is hard to carry out
large-scale controlled experiments. CTFs provide an
opportunity to study security practice in a controlled
environment. In the future, we would like to use a
record-replay system to record the actions of players for
later study. Such data collection would allow us to
understand how strong players do their work, examine weaker
players’ behavior in order to help them improve, and
understand more about the underlying vulnerabilities and how
players find and exploit them. We believe this research has
the potential not only to improve cybersecurity education
but also to generate insights that can be applied to
improving automated bug-finding tools.

8  Related  Work


Automatic problem generation for educational assessment is
an active research area, especially with the rise of online
education and computer-based learning. Most prior work
focuses on more traditional educational environments, but
some work has been done in applying these ideas to computer
security and CTFs in particular.

We are not the first to recognize the value of automatically
generating CTF challenges. We are, however, the first to be
able to inject bugs into existing programs instead of using
substitution-based approaches or domain-specific languages.
Previous approaches give the challenge author more control
over the generated challenge in exchange for limiting the
diversity of the challenges generated per template.

The first work in automatic challenge generation was done by
Burket et al. Their event, picoCTF [4], is targeted at
middle and high school students, so the difficulty must be
low. Because of this, they focus on challenges in categories
other than memory corruption attacks. For example, they
might automatically change the key of a simple cipher on a
per-team basis.

Their competition also has a cash prize, and thus cheating
is a real threat. Therefore, their work focuses on catching
teams who are sharing answers. Their approach is to template
the challenges and then use a system to automatically fill
out the templates on a per-team basis with a unique flag and
other per-team parameters. Another consequence of the cash
prize is that they need to ensure a consistent difficulty
across the generated challenges.
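
A minimal sketch of this kind of per-team templating follows, under our own assumptions: the template text, the secret, the flag format, and the HMAC-based derivation are all invented for illustration and are not taken from picoCTF.

```python
import hashlib
import hmac
from string import Template

ORGANIZER_SECRET = b"change-me"  # hypothetical server-side secret

CHALLENGE = Template(
    "Team $team: the flag was encrypted with Caesar shift $shift.\n"
    "Ciphertext: $ciphertext\n"
)


def team_digest(team: str, label: str) -> bytes:
    """Derive reproducible per-team bytes from the team name."""
    msg = f"{team}:{label}".encode()
    return hmac.new(ORGANIZER_SECRET, msg, hashlib.sha256).digest()


def caesar(text: str, shift: int) -> str:
    """Shift lowercase letters; leave digits and punctuation alone."""
    return "".join(
        chr((ord(ch) - 97 + shift) % 26 + 97) if "a" <= ch <= "z" else ch
        for ch in text
    )


def instantiate(team: str):
    """Fill the template with a unique flag and a per-team key."""
    flag = "flag_" + team_digest(team, "flag").hex()[:16]
    shift = team_digest(team, "shift")[0] % 25 + 1  # 1..25, never 0
    text = CHALLENGE.substitute(
        team=team, shift=shift, ciphertext=caesar(flag, shift)
    )
    return flag, shift, text


flag_a, shift_a, text_a = instantiate("alpha")
flag_b, shift_b, text_b = instantiate("beta")
print(text_a)
```

Because each flag in this sketch is derived from the team name, an organizer could recompute which team a submitted flag was issued to, which is one plausible way to notice shared answers.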

Building on the work by Burket et al., Gabor Szarka
developed Blinker, a domain-specific language to describe
challenges. The main change from the previous work is a
focus on binary challenges using a custom LLVM toolchain.
Blinker also provides an automation framework for creating
network forensics challenges. Currently, the author is
running an online capture the flag event using his
framework. There have been no published results for this
event as of the writing of this paper.

Pewny and Holz created a system similar to LAVA called
EvilCoder [16], which subverts attacker-controlled data to
remove security checks in source code. Because EvilCoder
uses a static approach, it does not have the ability to
easily generate triggering inputs to prove the existence of
its bugs. LAVA bugs come with a triggering input and thus
give the LAVA user a head start in demonstrating their
exploitability.


9  Conclusion 

This paper introduced AutoCTF, a jeopardy-style computer
security competition employing automatically generated
vulnerabilities. These synthetic bugs, injected using an
extended version of the LAVA system, varied in type,
including controlled relative writes, read disclosures, and
stack pointer corruption abilities. Together, these provided
sufficient offensive power for exploitation. Teams playing
AutoCTF solved challenges involving both LAVA and manually
injected bugs during the competition, indicating a rough
equivalence. AutoCTF achieved considerable code reuse, with
four buggy versions each of only two base programs. Half of
the CTF was completely auto-generated, making that portion
very inexpensive. Our experience suggests that, with some
work to reduce artifacts, and a better-set difficulty level,
AutoCTF might be run next time using only LAVA bugs,
dramatically reducing cost. In the future, we imagine
AutoCTF might be set up to run virtually without human
intervention and provide an inexhaustible training ground
for those wanting to practice vulnerability discovery and
exploit development.